Problems in gene clustering based on gene expression data
Abstract
In this work, we assess the suitability of cluster analysis for the gene grouping problem posed by microarray data. Gene clustering is the exercise of grouping genes based on attributes, which are generally the expression levels over a number of conditions or subpopulations. The hope is that similarity with respect to expression is often indicative of similarity with respect to much more fundamental and elusive qualities, such as function. By formally defining the true gene-specific attributes as parameters, such as expected expression across the conditions, we obtain a well-defined gene clustering parameter of interest, which greatly facilitates the statistical treatment of gene clustering. We point out that genome-wide collections of expression trajectories often lack natural clustering structure prior to ad hoc gene filtering. The gene filters in common use introduce a certain circularity into most gene cluster analyses: genes are points in the attribute space, a filter is applied to depopulate certain areas of the space, and then clusters are sought (and often found!) in the “cleaned” attribute space. As a result, statistical investigations of cluster number and clustering strength are just as much a study of the stringency and nature of the filter as they are of any biological gene clusters. In the absence of natural clusters, gene clustering may still be a worthwhile exercise in data segmentation. In this context, partitions can be fruitfully encoded in adjacency matrices, and the sampling distribution of such matrices can be studied with a variety of bootstrapping techniques.
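The closing idea of the abstract, encoding a partition as an adjacency matrix and studying its sampling distribution by bootstrapping, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data layout (genes x conditions x replicates), the choice of a plain k-means as the clustering rule, the resampling of replicates within each condition, and the function names `adjacency`, `kmeans`, and `bootstrap_adjacency`.

```python
import numpy as np

def adjacency(labels):
    """Co-membership matrix: A[i, j] = 1 if genes i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd-style k-means (an assumed stand-in for any clustering rule)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def bootstrap_adjacency(Y, k, n_boot=100, seed=0):
    """Average adjacency matrix over bootstrap resamples.

    Y is assumed to have shape (genes, conditions, replicates). Replicates are
    resampled with replacement within each condition, gene-wise means are
    recomputed, and the genes are re-clustered on each resample.
    """
    rng = np.random.default_rng(seed)
    g, c, r = Y.shape
    total = np.zeros((g, g))
    for _ in range(n_boot):
        idx = rng.integers(0, r, size=(c, r))      # replicate indices per condition
        Yb = Y[:, np.arange(c)[:, None], idx]      # shape (genes, conditions, replicates)
        means = Yb.mean(axis=2)                    # estimated expression profile per gene
        total += adjacency(kmeans(means, k, rng))
    return total / n_boot

# Hypothetical usage on simulated data with no built-in cluster structure.
rng = np.random.default_rng(42)
Y = rng.normal(size=(30, 4, 5))
P = bootstrap_adjacency(Y, k=3, n_boot=20)
print(P.shape)  # (30, 30)
```

Each entry of `P` estimates how often two genes are placed in the same cluster across resamples. Working with the adjacency matrix rather than the cluster labels avoids the label-switching problem, since co-membership does not depend on how clusters are numbered.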
Related resources
Clustering of gene expression data using random forest dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These data typically involve a large number of variables (genes) relative to the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, as calculated by a distance function. As increa...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Effect of Cardiac Rehabilitation Program Based on Combined Training on VEGF/Endostatin Gene Expression Ratio in Patients with Acute Coronary Syndrome
Background: Coronary artery disease is one of the most common causes of death in the world. With the increase in the incidence of these diseases, surgical and non-surgical interventions followed by cardiovascular rehabilitation programs have become more important. The process of angiogenesis and improvement of blood flow is considered one of the therapeutic goals in these patients, and vascu...
Molecular study of biofilm gene of sulfate reducing bacteria (SRB) isolated from patients with periodontitis and the effect of aloe vera plant extract on its expression by Real time-PCR method
Background and Aims: Due to the increasing problems and side effects of the use of chemical antibacterial agents as well as antibiotic resistance, this study aimed to evaluate the effects of aloe vera gel on biofilm gene expression of sulfate-reducing bacteria (SRB) isolated from patients with periodontal infection by Real time-PCR method. Materials and Methods: For this study, 100 individu...
P-73: Effect of Donor Age on The Expression Stability of GAPDH as A Reference Gene for Gene Expression Analysis of Equine Adipose-Derived Mesenchymal Stem Cells
Background: Adipose tissue is a main source for isolation of equine mesenchymal stem cells (MSCs) at different ages. It seems that the characteristics of adipose-derived MSCs, especially the gene expression profile, change with increasing age. A proper reference gene is required for normalizing data in gene expression analysis by qRT-PCR. This study aimed to evaluate whether GAPDH has a stable ...
Using the Protein-protein Interaction Network to Identifying the Biomarkers in Evolution of the Oocyte
Background: Oocyte maturity includes nuclear and cytoplasmic maturity, both of which are important for embryo fertilization. The development of the oocyte is not limited to the period of follicular growth; it starts in the embryonic period and continues throughout life. In this study, for the purpose of evaluating the effect of the FSH hormone on the expression of genes, GEO access codes for this...
Publication date: 2003